In this analysis, I looked for common trends in the number of donations and the total money donated, based on other factors in the data.
First, I wanted to get a basic understanding of my dataset:
* How many observations am I working with?
* What types of data do I have and what are some basic summaries about this data?
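A minimal sketch of how these first questions could be answered in base R. The `donations` data frame below is a made-up stand-in, not the real file; only the column names are borrowed from the dataset.

```r
# Toy stand-in for the real data (values are invented for illustration).
donations <- data.frame(
  candidate = factor(c("Obama, Barack", "Obama, Barack", "Romney, Mitt")),
  city      = factor(c("LOS ANGELES", "OAKLAND", "SAN DIEGO")),
  amount    = c(10, 500, 250),
  date      = as.Date(c("2011-09-27", "2011-08-29", "2012-10-01"))
)

dims <- dim(donations)   # number of observations and number of variables
dims
names(donations)         # column names
str(donations)           # type and first few values of each column
summary(donations)       # per-column summaries (counts for factors, quantiles for numbers)
```

Run against the full dataset, the same four calls would give output of the kind shown next.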
## [1] 908181 23
## [1] "committee.id" "candidate.id" "candidate"
## [4] "name" "city" "state"
## [7] "zip" "employer" "occupation"
## [10] "amount" "date" "receipt.description"
## [13] "memo.code" "memo.text" "form.type"
## [16] "file.number" "transaction.id" "election.type"
## [19] "democrat" "democrat.amount" "republican"
## [22] "republican.amount" "all.donations"
## 'data.frame': 908181 obs. of 23 variables:
## $ committee.id : Factor w/ 15 levels "C00410118","C00431171",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ candidate.id : Factor w/ 14 levels "P00003608","P20002523",..: 13 13 13 13 13 13 13 13 13 13 ...
## $ candidate : Factor w/ 2 levels "Obama, Barack",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ name : Factor w/ 199431 levels "..., ANTHONY",..: 15658 86905 15322 173182 123110 121194 117757 22526 101066 181746 ...
## $ city : Factor w/ 2497 levels "","@GMAIL.COM",..: 2011 1116 1766 495 2275 566 1983 1134 1574 2024 ...
## $ state : Factor w/ 1 level "CA": 1 1 1 1 1 1 1 1 1 1 ...
## $ zip : Factor w/ 115949 levels "","*9040","*9136",..: 98439 17525 63152 9576 15040 19192 41578 67294 62008 13821 ...
## $ employer : Factor w/ 68255 levels ""," FAIRCHILD SEMI",..: 48612 48612 48612 5206 13650 48612 41612 23615 48612 53064 ...
## $ occupation : Factor w/ 31407 levels ""," BUILDING TECH",..: 23369 23369 23369 18321 26086 23369 23369 15850 23369 31253 ...
## $ amount : num 10 500 25 30 100 112 250 500 25 250 ...
## $ date : Date, format: "2011-09-27" "2011-08-29" ...
## $ receipt.description: Factor w/ 37 levels "","ATTRIBUTION TO PARTNERS REQUESTED / REDESIGNATION REQUESTED",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ memo.code : Factor w/ 2 levels "","X": 1 1 1 1 1 1 1 1 1 1 ...
## $ memo.text : Factor w/ 261 levels "","*","* EARMARKED CONTRIBUTION: SEE BELOW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ form.type : Factor w/ 3 levels "SA17A","SA18",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ file.number : int 756218 756218 756218 756218 756218 756218 756218 756218 756218 756218 ...
## $ transaction.id : Factor w/ 870697 levels "0000002","0000004-0001",..: 41504 37447 40383 42192 40782 34292 37792 35669 40214 41799 ...
## $ election.type : Factor w/ 8 levels "","G2008","G2012",..: 7 7 7 7 7 7 7 7 7 7 ...
## $ democrat : num 1 1 1 1 1 1 1 1 1 1 ...
## $ democrat.amount : num 10 500 25 30 100 112 250 500 25 250 ...
## $ republican : num 0 0 0 0 0 0 0 0 0 0 ...
## $ republican.amount : num 0 0 0 0 0 0 0 0 0 0 ...
## $ all.donations : num 1 1 1 1 1 1 1 1 1 1 ...
## committee.id candidate.id candidate
## C00431445:717147 P80003338:717147 Obama, Barack:717147
## C00431171:191034 P80003353:191034 Romney, Mitt :191034
## C00410118: 0 P00003608: 0
## C00493692: 0 P20002523: 0
## C00494393: 0 P20002556: 0
## C00495622: 0 P20002671: 0
## (Other) : 0 (Other) : 0
## name city state
## HENRY, MICHELLE : 272 LOS ANGELES : 68063 CA:908181
## GROVER-MCKAY, MALEAH : 245 SAN FRANCISCO: 57619
## BRYANT, REGINA : 223 SAN DIEGO : 35024
## RACCIO, KYLE : 178 OAKLAND : 23733
## BEEGHLY, CHRISTINA D. MS.: 167 SAN JOSE : 18159
## REYNOLDS, MARK : 166 BERKELEY : 16876
## (Other) :906930 (Other) :688707
## zip employer
## 94114 : 2792 RETIRED :182245
## 94110 : 2515 SELF-EMPLOYED :131549
## 94611 : 2180 NOT EMPLOYED : 66741
## 94117 : 2112 INFORMATION REQUESTED PER BEST EFFORTS: 19852
## 90046 : 1676 INFORMATION REQUESTED : 19138
## 94941 : 1622 (Other) :488441
## (Other):895284 NA's : 215
## occupation amount
## RETIRED :203434 Min. : 0.09
## ATTORNEY : 30840 1st Qu.: 25.00
## HOMEMAKER : 21084 Median : 50.00
## INFORMATION REQUESTED PER BEST EFFORTS: 18989 Mean : 190.16
## TEACHER : 18464 3rd Qu.: 100.00
## (Other) :615313 Max. :30000.00
## NA's : 57
## date receipt.description
## Min. :2011-04-04 :906365
## 1st Qu.:2012-07-16 REDESIGNATION FROM PRIMARY : 647
## Median :2012-09-17 REATTRIBUTION / REDESIGNATION REQUESTED: 376
## Mean :2012-08-10 REATTRIBUTION FROM SPOUSE : 328
## 3rd Qu.:2012-10-17 SEE REATTRIBUTION : 227
## Max. :2012-12-31 REDESIGNATION FROM GENERAL : 194
## (Other) : 44
## memo.code memo.text form.type
## :692330 :688882 SA17A:693673
## X:215851 * OBAMA VICTORY FUND 2012 :135822 SA18 :214508
## TRANSFER FROM ROMNEY VICTORY INC. : 77912 SB28A: 0
## * EARMARKED CONTRIBUTION: SEE BELOW: 1189
## * : 665
## REDESIGNATION FROM PRIMARY : 647
## (Other) : 3064
## file.number transaction.id election.type democrat
## Min. :756214 C19560355 : 2 G2012 :529269 Min. :0.0000
## 1st Qu.:810684 SA17.1000048: 2 P2012 :378261 1st Qu.:1.0000
## Median :821325 SA17.1000101: 2 O2012 : 651 Median :1.0000
## Mean :830860 SA17.1000103: 2 : 0 Mean :0.7897
## 3rd Qu.:842943 SA17.1000138: 2 G2008 : 0 3rd Qu.:1.0000
## Max. :944828 SA17.1000144: 2 P : 0 Max. :1.0000
## (Other) :908169 (Other): 0
## democrat.amount republican republican.amount all.donations
## Min. : 0.0 Min. :0.0000 Min. : 0.00 Min. :1
## 1st Qu.: 10.0 1st Qu.:0.0000 1st Qu.: 0.00 1st Qu.:1
## Median : 35.0 Median :0.0000 Median : 0.00 Median :1
## Mean : 103.1 Mean :0.2103 Mean : 87.05 Mean :1
## 3rd Qu.: 100.0 3rd Qu.:0.0000 3rd Qu.: 0.00 3rd Qu.:1
## Max. :25800.0 Max. :1.0000 Max. :30000.00 Max. :1
##
This summary alone reveals a lot about the dataset:
* Most donations (about 79%) go to Obama.
* Not surprisingly, the cities that send the most donations are the most populous cities.
* Many donors are retired or self-employed (more on this later).
* Of those employed by others, the most common occupation is attorney.
* The median donation is $50, and the distribution is right-skewed (the mean is about $190).
* The median date is Sept 17th, 2012. Thus, 50% of all donations occur on or after that date, which is about seven weeks before the election on Nov 6th, 2012.
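As a hedged illustration of the date check in the last bullet, the sketch below computes the median date and its distance from election day. The `dates` vector is invented (it reuses the quartile dates from the summary as sample points), and the 30-day window is an assumption chosen for illustration.

```r
# Invented sample of dates; in the real data this would be the `date` column.
dates <- as.Date(c("2011-09-27", "2012-07-16", "2012-09-17",
                   "2012-10-17", "2012-12-31"))
election_day <- as.Date("2012-11-06")

# Median donation date and how many days before the election it falls.
median_date <- median(dates)
days_before_election <- as.numeric(election_day - median_date)

# Share of donations made in the 30 days up to and including election day.
in_last_month <- dates > election_day - 30 & dates <= election_day
share_last_month <- mean(in_last_month)

median_date
days_before_election
share_last_month
```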
Next, I looked at the most common values of my categorical variables.
Nothing too surprising here. Generally, the cities and zip codes with the most donations tend to be the ones with the largest populations.
At first glance, this histogram seems strange. Do retired people really donate that much more often than everyone else? There is some credence to this, as older citizens tend to be the most politically active in terms of voting. However, the trend is better explained by the fact that retired people are probably not that different from other contributors, except that they are older and no longer work. In other words, ex-teachers and ex-attorneys are all lumped into one category, “retired”. Thus, in reviewing the most commonly held occupations, it makes more sense to exclude retired people. Additionally, let’s remove those who did not disclose their occupation.
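One way this filtering could look in base R is sketched below. The `occupation` vector is toy data, and the list of excluded labels is an assumption based on the values that appear in the summary output, not necessarily the exact list used in the analysis.

```r
# Toy occupation values (invented); the real column is `occupation`.
occupation <- c("RETIRED", "ATTORNEY", "RETIRED", "TEACHER", "ATTORNEY",
                "INFORMATION REQUESTED PER BEST EFFORTS", "", "ATTORNEY")

# Labels treated as "retired" or "not disclosed" (assumed list).
excluded <- c("", "RETIRED", "INFORMATION REQUESTED",
              "INFORMATION REQUESTED PER BEST EFFORTS")

# Drop the excluded labels, then count what remains, most common first.
kept <- occupation[!occupation %in% excluded]
top_occupations <- sort(table(kept), decreasing = TRUE)
top_occupations
```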
Even from these results, it’s still hard to draw any particular conclusions. We’d need data on the distribution of occupations in California to tell whether these particular occupations are more likely to contribute, or are just the most common amongst the population at large.
A few types of “employers” skew the distribution a bit. Let’s remove non-organizational employers (such as “RETIRED”, “SELF-EMPLOYED”, and “NOT EMPLOYED”)…
The most common employers tend to be the largest employers within the state of California. Again, we’d need to know more about the distribution of employers to tell whether any particular employer over- or under-indexes on rate of contribution.
In analyzing amount, I wanted to know what the distributions looked like and how they varied between parties.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.09 25.00 50.00 190.20 100.00 30000.00
Clearly, there are some large outliers among the amounts. Because the initial histogram was right-skewed, I first applied a log transformation to see whether the result would follow a normal distribution. It does look closer to normal, though not perfectly so.
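The effect of the log transformation can be sketched on simulated data. Everything here is an assumption for illustration: the amounts are drawn from a log-normal distribution rather than taken from the dataset, and base-10 logs are used although the analysis may have used a different base.

```r
set.seed(1)

# Simulated right-skewed "amounts" (not real data).
amount <- rlnorm(1000, meanlog = log(50), sdlog = 1.2)
log_amount <- log10(amount)

# On the raw scale the mean sits well above the median (right skew);
# on the log scale the two are close, as expected for a roughly normal shape.
c(mean = mean(amount), median = median(amount))
c(mean = mean(log_amount), median = median(log_amount))

# Histogram of the transformed values, computed without drawing.
h <- hist(log_amount, breaks = 30, plot = FALSE)
```

Calling `plot(h)` draws the histogram. A normal Q-Q plot via `qqnorm(log_amount)` is another quick way to judge how far the transformed values are from normal.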
Next, I asked, is there a difference in distributions in donations by party?
Based on these graphs alone, it’s difficult to make out precise details. Instead, I broke these out with boxplots and summary statistics.
## $`Obama, Barack`
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.09 20.00 50.00 130.60 100.00 25800.00
##
## $`Romney, Mitt`
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.0 50.0 100.0 413.9 285.4 30000.0
After zooming in on the graph, it becomes very clear that the distribution of amounts differs noticeably between donations to Obama and donations to Romney. The numerical summary above shows the difference exactly: Romney’s median donation is twice Obama’s ($100 vs. $50), and his mean is more than three times as large ($413.90 vs. $130.60). This explains why, although Romney received far fewer contributions, his total amount raised is not proportionally lower.
## Obama, Barack Romney, Mitt
## 93638609 79060174
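Per-candidate summaries and totals of this kind can be produced with `tapply`. The sketch below uses a small invented data frame; whether the original analysis used `tapply` or another grouping function is an assumption.

```r
# Toy data (invented); column names follow the dataset.
donations <- data.frame(
  candidate = c("Obama, Barack", "Obama, Barack", "Obama, Barack",
                "Romney, Mitt", "Romney, Mitt"),
  amount    = c(20, 50, 100, 100, 1000),
  stringsAsFactors = FALSE
)

# Five-number summary plus mean of amount, one entry per candidate.
by_candidate <- tapply(donations$amount, donations$candidate, summary)

# Total dollars and number of donations per candidate.
totals <- tapply(donations$amount, donations$candidate, sum)
counts <- table(donations$candidate)

by_candidate
totals
counts
```

In the toy data, as in the real totals, the candidate with fewer donations can still raise a comparable or larger sum when the individual amounts are larger.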
There are a few noticeable trends in the occupation data. First, expected occupational income seems to be related to the amount donated. For example, physicians and attorneys tend to donate more per person than teachers. The exception to this would be homemakers, but this makes sense upon further consideration. If you’re a homemaker, your spouse probably must make enough money to support the entire household. Thus, households with a homemaker may have higher incomes on average, and thus be able to donate money from both partners at higher amounts.
It is difficult to comment on the frequency of attorneys and physicians without outside data outlining their relative frequencies in the California population, but I’m willing to guess that they donate more often than average because specific political issues are likely to be very relevant to them given their field of work.
## Warning: Removed 7 rows containing missing values (position_stack).
Once again, we see that candidate is strongly related to donations across the top occupations sampled. In these charts, we also see that professors tend to lean more towards donating to the Democratic candidate, whereas occupations related to business functions (sales, manager, president, real estate, etc.) tend to lean toward donating to the Republican candidate.
Given what we know about amounts across all of California, I next explored city-level data.
Some of this chart is as expected. The largest cities like Los Angeles and San Francisco donate the most money just through their sheer population size. But some smaller cities also make this list. Let’s separate out the effects of population by looking at boxplots of the amounts by cities…
Clearly, some cities (like Newport Beach) donate at a rate far above that of other cities, which is why Newport Beach makes the list despite having a population of only roughly 85,000 (as opposed to Sacramento, with a population of approximately 480,000).
Next, I asked: What are the most partisan cities in terms of dollars donated and number of donations?
Clearly, both city and candidate can be strongly related to the amount donated.
Finally, I wanted to look at: what are the most partisan cities by percentage of contributions and by percentage of total money contributed?
Note that for the analysis of cities above, a minimum threshold of 50 donations was set for a city to be included in the partisan city analysis.
Based on city data, it’s clear that Republicans do best in small, affluent communities in Southern California. Democrats do best in Berkeley and small cities nearby in the Bay Area. A notable exception to this trend is Hollywood, a Democratic stronghold.
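A minimal sketch of how such a threshold and the two partisan measures could be computed in base R. The data are invented, "TINYTOWN" is a hypothetical city, and the threshold is lowered to 3 so the toy example has something to filter; the analysis itself used 50.

```r
# Toy data (invented); column names follow the dataset.
donations <- data.frame(
  city      = c(rep("BERKELEY", 4), rep("NEWPORT BEACH", 3), "TINYTOWN"),
  candidate = c("Obama, Barack", "Obama, Barack", "Obama, Barack", "Romney, Mitt",
                "Romney, Mitt", "Romney, Mitt", "Obama, Barack", "Romney, Mitt"),
  amount    = c(25, 50, 25, 100, 500, 1000, 100, 2500),
  stringsAsFactors = FALSE
)
min_donations <- 3  # assumption for the toy data; the analysis used 50

# Keep only cities with at least `min_donations` donations.
n_by_city  <- table(donations$city)
big_cities <- names(n_by_city)[n_by_city >= min_donations]
kept       <- donations[donations$city %in% big_cities, ]

# Share of donations, and share of dollars, going to Romney in each city.
is_romney <- kept$candidate == "Romney, Mitt"
pct_donations_romney <- tapply(is_romney, kept$city, mean)
pct_money_romney <- tapply(kept$amount * is_romney, kept$city, sum) /
  tapply(kept$amount, kept$city, sum)

sort(pct_donations_romney, decreasing = TRUE)
sort(pct_money_romney, decreasing = TRUE)
```

Without the threshold, a city with a single large donation (like the hypothetical "TINYTOWN" here) would show up as 100% partisan, which is why a minimum count is useful.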